Polis is an open-source wiki-survey platform for rapid, scalable, open-ended feedback. Participants submit short comments, which are sent out semi-randomly to other participants to vote on (by clicking agree, disagree, or pass). Polis then uses statistical algorithms to find patterns of consensus and opinion groups.
The raw data collected comprises:
| Participants | Groups | Commenters | Comments | Votes | Agrees | Disagrees | Votes / participant (avg) |
|---|---|---|---|---|---|---|---|
| 502 | 4 | 205 | 441 | 23062 | 16608 | 4362 | 45.94 |
After removing moderated-out comments and participants who voted on fewer than 7 comments, we have:
| Participants | Groups | Commenters | Comments | Votes | Agrees | Disagrees | Votes / participant (avg) |
|---|---|---|---|---|---|---|---|
| 405 | 4 | 38 | 145 | 21724 | 15434 | 4282 | 53.63 |
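The filtering step behind the second table can be sketched as follows. This is a minimal illustration, not the code used to produce the report: it assumes votes are available as `(participant_id, comment_id, vote)` tuples with `1` for agree, `-1` for disagree, and `0` for pass, and that moderated-out comments are removed before the 7-vote threshold is applied. The names `filter_votes` and `summarize` are hypothetical.

```python
from collections import Counter

MIN_VOTES = 7  # threshold stated in the text


def filter_votes(votes, moderated_out, min_votes=MIN_VOTES):
    """Drop votes on moderated-out comments, then drop low-activity participants.

    votes: iterable of (participant_id, comment_id, vote) tuples (assumed layout).
    moderated_out: set of comment ids rejected by moderators.
    """
    kept = [v for v in votes if v[1] not in moderated_out]
    per_participant = Counter(pid for pid, _, _ in kept)
    return [v for v in kept if per_participant[v[0]] >= min_votes]


def summarize(votes):
    """Compute some of the summary columns shown in the tables."""
    participants = {pid for pid, _, _ in votes}
    n_votes = len(votes)  # passes count as votes
    return {
        "participants": len(participants),
        "comments": len({cid for _, cid, _ in votes}),
        "votes": n_votes,
        "agrees": sum(1 for _, _, v in votes if v == 1),
        "disagrees": sum(1 for _, _, v in votes if v == -1),
        "votes_per_participant": round(n_votes / len(participants), 2)
        if participants
        else 0.0,
    }
```

Here the "Votes" figure includes passes, which matches the tables: agrees plus disagrees add up to less than the vote total.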
Here we can see the distribution of these votes and comments over time as the conversation unfolded.
Next, we'll look at the variance in the data by plotting comments according to their number of agrees and disagrees. The data is plotted on logarithmic axes because the distribution of vote counts per comment is highly skewed. The red line separates comments that were predominantly agreed with (bottom right) from those predominantly disagreed with (top left).
Note that comments with far more disagrees than agrees had much lower overall vote counts. This is a direct result of Polis's comment-routing architecture, which deprioritizes comments that most people disagree with.
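The per-comment counts behind such a plot can be tallied as in the sketch below. It assumes the same `(participant_id, comment_id, vote)` tuples with `1` for agree and `-1` for disagree, and it assumes the red line is the diagonal where agrees equal disagrees. Both function names are hypothetical.

```python
from collections import defaultdict

AGREE, DISAGREE = 1, -1  # assumed vote encoding; any other value is a pass


def tally_by_comment(votes):
    """Return {comment_id: (agrees, disagrees, total_votes)}."""
    counts = defaultdict(lambda: [0, 0, 0])
    for _, comment_id, vote in votes:
        entry = counts[comment_id]
        if vote == AGREE:
            entry[0] += 1
        elif vote == DISAGREE:
            entry[1] += 1
        entry[2] += 1  # passes still count toward the total
    return {cid: tuple(entry) for cid, entry in counts.items()}


def side_of_diagonal(agrees, disagrees):
    """Classify a comment relative to the agrees == disagrees line."""
    if agrees > disagrees:
        return "agreed"
    if agrees < disagrees:
        return "disagreed"
    return "tied"
```

Comparing the `total_votes` of comments on each side of the diagonal is one way to check the observation that heavily disagreed-with comments receive fewer votes overall.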
We can take these votes and arrange them into a matrix, where rows correspond to participants and columns correspond to comments. This allows us to think of participants as having positions in a high-dimensional space (with dimensionality equal to the number of comments).
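A minimal sketch of building that matrix with NumPy is shown below. It assumes `(participant_id, comment_id, vote)` tuples with votes encoded as `1`, `-1`, and `0`, and it uses `NaN` for comments a participant never saw; both choices are assumptions made for illustration. For the filtered data summarized earlier, the result would be a 405 × 145 matrix.

```python
import numpy as np


def vote_matrix(votes):
    """Build a participants x comments matrix from (pid, cid, vote) tuples.

    Unseen (participant, comment) pairs are left as NaN so they can be
    distinguished from explicit passes (0).
    """
    pids = sorted({pid for pid, _, _ in votes})
    cids = sorted({cid for _, cid, _ in votes})
    row = {pid: i for i, pid in enumerate(pids)}
    col = {cid: j for j, cid in enumerate(cids)}

    matrix = np.full((len(pids), len(cids)), np.nan)
    for pid, cid, vote in votes:
        matrix[row[pid], col[cid]] = vote
    return matrix, pids, cids
```

Each row of `matrix` is then one participant's position in comment space.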
While the above visualization may be impressive, it's not particularly useful for understanding how participants' opinions relate to each other. To better understand this, we can apply a dimensionality reduction algorithm, which captures as much of the variance in the data as possible within a lower-dimensional space. Specifically, reducing to two dimensions allows us to plot participants' locations relative to each other in an opinion space, where participants are close together if they tend to agree and further apart if they tend to disagree. Here, we're also coloring according to a K-means clustering of the participants into opinion groups, which lets us ask questions about what's important to different groups and better understand the opinion landscape.
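The projection and clustering described here can be sketched with plain NumPy. This is an illustration under stated assumptions rather than the pipeline used for the report: missing votes (`NaN`) are filled with each comment's mean vote, the projection is ordinary PCA computed via SVD, and the clustering is basic Lloyd's K-means with a deterministic farthest-point initialization. The function names are hypothetical.

```python
import numpy as np


def project_2d(matrix):
    """Project participants onto the first two principal components.

    matrix: participants x comments array, NaN where no vote was cast.
    Returns (coords, components): coords is n_participants x 2,
    components is 2 x n_comments (the loadings of each comment).
    """
    col_means = np.nanmean(matrix, axis=0)
    filled = np.where(np.isnan(matrix), col_means, matrix)  # assumed imputation
    centered = filled - filled.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:2]
    return centered @ components.T, components


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[nearest.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

With these helpers, `coords, components = project_2d(matrix)` followed by `labels, _ = kmeans(coords, 4)` would give each participant a 2-D position and one of four group labels, matching the group count in the tables above.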
Below, we can see the proportion of total variance explained by the x and y axes (the first two principal components) in the plot above:
[0.11246402580835359 0.05696933229182148]
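The two values above sum to about 0.169, so the 2-D plot captures roughly 17% of the total variance in the vote matrix. As a sketch of where such numbers come from, the explained variance ratio of each principal component is its squared singular value divided by the sum of all squared singular values. The function below assumes a matrix with no missing entries (for example, after imputation); its name is hypothetical.

```python
import numpy as np


def explained_variance_ratio(filled):
    """Fraction of total variance carried by each principal component.

    filled: participants x comments array with no NaN entries.
    """
    centered = filled - filled.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()
```

The first two entries of the returned array correspond to the x and y axes of the plot.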
The comments most strongly correlated with position along the X-axis:
The comments most strongly correlated with position along the Y-axis:
The comments with the highest overall loading across principal components 1 and 2:
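Rankings like these could be computed along the following lines. The sketch assumes a participants × comments vote matrix with `NaN` for missing votes, a vector of participant coordinates along one axis, and a 2 × n_comments array of PCA loadings. It also assumes that "overall loading" means the Euclidean norm of a comment's two loadings, which the report does not state. The function names are hypothetical.

```python
import numpy as np


def axis_correlations(matrix, axis_coords):
    """Pearson correlation between each comment's votes and an axis position.

    Only participants who voted on a comment are used for that comment.
    Comments with too little data or no variation get NaN.
    """
    result = np.full(matrix.shape[1], np.nan)
    for j in range(matrix.shape[1]):
        seen = ~np.isnan(matrix[:, j])
        votes, position = matrix[seen, j], axis_coords[seen]
        if seen.sum() >= 2 and votes.std() > 0 and position.std() > 0:
            result[j] = np.corrcoef(votes, position)[0, 1]
    return result


def overall_loading(components):
    """Combined magnitude of each comment's loadings on PCs 1 and 2."""
    return np.linalg.norm(components[:2], axis=0)
```

Sorting comments by the absolute value of `axis_correlations(...)`, or by `overall_loading(...)`, in descending order would then give the top comments for each list.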
The most agreed-on comments:
Comments representative of group 0
Comments representative of group 1
Comments representative of group 2
Comments representative of group 3
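One simple way to score how representative a comment is of a group, not necessarily the method used for this report, is to compare the group's smoothed probability of agreeing with that of everyone else. The sketch below assumes a participants × comments matrix with `1` for agree and `NaN` for missing votes, plus an array of group labels per participant. The add-one smoothing and the function names are illustrative assumptions.

```python
import numpy as np


def agree_probability(column):
    """Smoothed probability of agreeing, ignoring participants who did not vote."""
    seen = column[~np.isnan(column)]
    return (1 + np.sum(seen == 1)) / (2 + len(seen))


def representativeness(matrix, labels, group):
    """Ratio of in-group to out-of-group agree probability for each comment.

    Values well above 1 mark comments the group agrees with far more
    than other participants do.
    """
    in_group = labels == group
    scores = []
    for j in range(matrix.shape[1]):
        p_in = agree_probability(matrix[in_group, j])
        p_out = agree_probability(matrix[~in_group, j])
        scores.append(p_in / p_out)
    return np.array(scores)
```

Ranking comments by this score for each of groups 0 through 3 would give one candidate list per group. The smoothing keeps comments with very few votes from dominating the ranking.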